Wideband Harmonic Model: Alignment and Noise Modeling for High Quality Speech Synthesis
نویسندگان
چکیده
Speech sinusoidal modeling has been successfully applied to a broad range of speech analysis, synthesis and modification tasks. However, developing a high fidelity full band sinusoidal model that preserves its high quality on speech transformation still remains an open research problem. Such a system can be extremely useful for high quality speech synthesis. In this paper we present an enhanced harmonic model representation for voiced/mixed wide band speech that is capable of high quality speech reconstruction and transformation in the parametric domain. Two key elements of the proposed model are a proper phase alignment and a decomposition of a speech frame to "deterministic" and dense "stochastic" harmonic model representations that can be separately manipulated. The coupling of stochastic harmonic representation with the deterministic one is performed by means of intra-frame periodic energy envelope, estimated at analysis time and preserved during original/transformed speech reconstruction. In addition, we present a compact representation of the stochastic harmonic component, so that the proposed model has less parameters than the regular full band harmonic model, with better Signal to Reconstruction Error performance. On top of that, the improved phase alignment of the proposed model provides better phase coherency in transformed speech, resulting in better quality of speech transformations. We demonstrate the subjective and objective performance of the new model on speech reconstruction and pitch modification tasks. Performance of the proposed model within unit selection TTS is also presented.
منابع مشابه
A Hybrid Quasi-Harmonic/CELP Wideband Speech Coding Scheme for Unit Selection TTS Synthesis
This paper suggests a new wideband speech coding model to efficiently compress acoustic inventories for concatenative unit selection text-to-speech (TTS) synthesis system. To fulfill the requirements of TTS synthesizer such as partial segment decoding and random access capability, a non-predictive scheme was adopted which combines the adaptive Quasi-Harmonic Model (aQHM) with the innovative cod...
متن کاملUsing Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article aims to examine methods of optimizing GMM-based voice conversion systems performance in which GMM method is introduced as the basic method for improvement of voice conversion systems performance. In the current methods, due to using a single conversion function to convert all speech units and subsequent spectral smoothing arising from statistical averaging, we will observe quality ...
متن کاملApproximate Kalman Filtering for the Harmonic plus Noise Model
We present a probabilistic description of the Harmonic plus Noise Model (HNM) for speech signals. This probabilistic formulation permits Maximum Likelihood (ML) parameter estimation and speech synthesis becomes a straightforward sampling from a distribution. It also permits development of a Kalman filter that tracks model parameters such as pitch, harmonic amplitudes, and autoregressive coeffic...
متن کاملApplying the harmonic plus noise model in concatenative speech synthesis
This paper describes the application of the harmonic plus noise model (HNM) for concatenative text-to-speech (TTS) synthesis. In the context of HNM, speech signals are represented as a time-varying harmonic component plus a modulated noise component. The decomposition of a speech signal into these two components allows for more natural-sounding modifications of the signal (e.g., by using differ...
متن کاملConcatenative speech synthesis using a harmonic plus noise model
This paper describes the application of the Harmonic plus Noise Model, HNM, for concatenative Text-to-Speech (TTS) synthesis. In the context of HNM, speech signals are represented as a time-varying harmonic component plus a modulated noise component. The decomposition of speech signal in these two components allows for more natural-sounding modi cations (e.g., source and lter modi cations) of t...
متن کامل